On Approximately Searching for Similar Word Embeddings
Abstract
We discuss approximate similarity search for word embeddings, the operation of approximately finding embeddings close to a given vector. We compared several metric-based search algorithms with hash-, tree-, and graph-based indexing from different aspects. Our experimental results showed that graph-based indexing exhibits robust performance; they also yielded useful practical guidance, e.g., that vector normalization enables efficient search with cosine similarity.
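The normalization point can be made concrete: for L2-normalized vectors, squared Euclidean distance and cosine similarity induce the same ranking, since ||u − v||² = 2 − 2·cos(u, v). A minimal NumPy sketch, with random vectors standing in for trained embeddings:

```python
import numpy as np

# Random vectors stand in for trained word embeddings (hypothetical data).
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 50))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize each row

q = emb[0]  # query vector (already unit length)
cos = emb @ q                                # cosine similarity to every row
dist2 = np.linalg.norm(emb - q, axis=1) ** 2  # squared Euclidean distance

# For unit vectors: ||u - v||^2 = 2 - 2*cos(u, v), so an index built for
# Euclidean distance (tree- or graph-based) can answer cosine queries.
assert np.allclose(dist2, 2 - 2 * cos)
assert np.argmin(dist2[1:]) == np.argmax(cos[1:])  # same nearest neighbor
```

This is why normalizing embeddings up front lets a single metric index serve both distance functions.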
Similar Resources
Linguistic Regularities in Sparse and Explicit Word Representations
Recent work has shown that neural-embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.’s method of first adding and subtracting word vectors, and then searching for a word similar to the result, is equivalent to searching for a word that maximizes a linear combination of three p...
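The additive scheme this snippet refers to can be sketched with toy vectors (the 4-d embeddings below are handcrafted for illustration, not trained): for an analogy a : b :: c : ?, return the vocabulary word w maximizing cos(w, b − a + c), excluding the query words themselves.

```python
import numpy as np

# Handcrafted toy embeddings (hypothetical values, not trained vectors).
emb = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.9, 0.1, 0.8, 0.0]),
    "man":    np.array([0.1, 0.9, 0.1, 0.1]),
    "woman":  np.array([0.1, 0.1, 0.9, 0.1]),
    "apple":  np.array([0.0, 0.0, 0.1, 0.9]),
    "banana": np.array([0.1, 0.0, 0.0, 0.8]),
}

def analogy(a, b, c):
    """a : b :: c : ?  --  argmax over w of cos(w, b - a + c)."""
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -2.0
    for w, v in emb.items():
        if w in (a, b, c):  # exclude the query words themselves
            continue
        sim = float(target @ v / np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman"))  # -> queen (with these toy vectors)
```

The final similarity search over the whole vocabulary is exactly the nearest-neighbor operation the main paper's indexing methods accelerate.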
Towards a Unified Framework for Transfer Learning: Exploiting Correlations and Symmetries
Medical Incident Report Classification using Context-based Word Embeddings
The University Medical Center Groningen is one of the largest hospitals in The Netherlands, employing over 10,000 people. In a hospital of this size, incidents are bound to occur on a regular basis. Most of these incidents are reported extensively, but the time-consuming nature of analyzing their textual descriptions and the sheer number of reports make it costly to process them. Therefore, this...
A Comparison of Word Embeddings for the Biomedical Natural Language Processing
Background: Neural word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications, as they provide vector representations of words that capture their semantic properties and the linguistic relationships between them. Many biomedical applications use different textual resources (e.g., Wikipedia and biomedical articles) to train word embeddings and apply thes...
Is deep learning really necessary for word embeddings?
Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks. However, such architectures can be difficult and time-consuming to train. Instead, we propose to drastically simplify the word-embedding computation through a Hellinger PCA of the word co-occurrence matrix. We compare those new word embeddings with some well-known embeddings ...
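The Hellinger-PCA recipe can be sketched as follows (a random count matrix stands in for real co-occurrence statistics; all names and sizes here are illustrative): normalize each row to a probability distribution, take element-wise square roots, then project onto the top principal components.

```python
import numpy as np

# A random count matrix stands in for real word-word co-occurrence counts
# (hypothetical data; +1 acts as add-one smoothing and avoids empty rows).
rng = np.random.default_rng(1)
counts = rng.integers(0, 20, size=(6, 6)).astype(float) + 1.0

# Each row becomes a co-occurrence probability distribution.
probs = counts / counts.sum(axis=1, keepdims=True)

# Hellinger transform: Euclidean distance between sqrt-rows is (up to a
# constant) the Hellinger distance between the underlying distributions.
hell = np.sqrt(probs)

# PCA via SVD of the centered matrix; keep d dimensions per word.
d = 2
centered = hell - hell.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
word_vectors = centered @ Vt[:d].T  # shape: (vocab_size, d)

print(word_vectors.shape)  # (6, 2)
```

Because PCA reduces to an SVD of the transformed matrix, this pipeline needs no neural-network training at all, which is the paper's point.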